Predicting intraoperative hypotension using deep learning with waveforms of arterial blood pressure, electroencephalogram, and electrocardiogram: Retrospective study

To develop deep learning models for predicting intraoperative hypotension (IOH) using waveforms from arterial blood pressure (ABP), electrocardiogram (ECG), and electroencephalogram (EEG), and to determine whether combining ABP with EEG or ECG improves model performance. Data were retrieved from VitalDB, a public data repository of vital signs taken during surgeries in 10 operating rooms at Seoul National University Hospital from January 6, 2005, to March 1, 2014. Retrospective data from 14,140 adult patients undergoing non-cardiac surgery with general anaesthesia were used. The predictive performances of models trained with different combinations of waveforms were evaluated and compared at 3, 5, 10, and 15 minutes before the event. Performance was measured by the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), sensitivity, and specificity. The model using both ABP and EEG waveforms performed better than all other models at all time points (3, 5, 10, and 15 minutes before an event). Using high-fidelity ABP and EEG waveforms, the model predicted IOH with an AUROC and AUPRC of 0.935 [0.932 to 0.938] and 0.882 [0.876 to 0.887], respectively, at 5 minutes before an IOH event. The output of the model using both ABP and EEG was better calibrated than that of models using other combinations or ABP alone. The results demonstrate that a predictive deep neural network can be trained using ABP, ECG, and EEG waveforms, and that the combination of ABP and EEG improves model performance and calibration.


Introduction
Intraoperative hypotension is conventionally defined as a drop in mean arterial pressure (MAP) to less than 65 mmHg during surgery [1], and it is associated with postoperative myocardial infarction, acute kidney injury, and postoperative mortality [2,3]. The causes of a blood pressure drop during surgery are multifactorial and include bleeding, anaesthetics, underlying illnesses, and certain preoperative medications [4,5]. If intraoperative hypotension could be identified in advance, it could be addressed quickly or prevented using various measures, which would ultimately improve postoperative patient outcomes. Recently, a randomized clinical trial used a machine-learning-derived early-warning system with parameters derived from the arterial pressure waveform, resulting in less intraoperative hypotension [6].
The electrocardiogram (ECG) waveform is the most commonly monitored biosignal in operating rooms and intensive care units. Cardiac-rhythm disturbances, myocardial ischemia, and electrolyte disturbances, which can lead to intraoperative hypotension, are reflected in ECG monitoring [7]. Additionally, during surgery, some patients undergo two- or four-channel electroencephalogram (EEG) monitoring, such as bispectral index measurement, to monitor the depth of sedation or anaesthesia [8]. Deep anaesthesia can cause intraoperative hypotension through substantial inhibition of sympathetic activity, and it can reduce myocardial contractility and systemic vascular resistance. Furthermore, several studies have shown that low bispectral index values during surgery are also associated with postoperative mortality [9][10][11][12].
Most previous studies on machine-learning-derived early-warning algorithms are limited by relying on only a single data source, such as arterial pressure waveforms or photoplethysmographs [13][14][15]. However, hemodynamic changes are also associated with alterations in other physiological profiles, including ECG and EEG [16]. In addition, early-warning models based on arterial blood pressure (ABP) waveforms necessarily require extensive feature engineering based on proprietary algorithms. In the present study, we hypothesized that combining ABP waveforms with EEG or ECG may better predict intraoperative hypotension. We trained a deep neural network using waveforms of ABP, ECG, and EEG recorded during surgery, without specific hemodynamic parameters derived from a proprietary algorithm.

Objective
The objective of this study was to develop a model to predict intraoperative hypotension using ABP, ECG, and EEG waveforms. Intraoperative hypotension was defined as a drop in mean arterial pressure (MAP) to less than 65 mmHg.

Dataset
Data were retrieved from VitalDB, a public data repository of vital signs taken during surgeries from January 6, 2005, to March 1, 2014. Data-recording software (Vital Recorder 1.7.4; https://vitaldb.net/vital-recorder) was used to collect the data from 10 operating rooms at Seoul National University Hospital [17]. The inclusion criteria were as follows: (1) adults (age ≥ 18); (2) administered general anaesthesia; and (3) undergone non-cardiac surgery. The exclusion criteria were as follows: (1) any missing monitoring for ABP, ECG, and EEG waveforms; and (2) cases containing false events or non-events due to poor signal quality shown as

Data selection/Pre-processing
We defined a hypotensive event as a 1-min interval in which a patient sustained a MAP of less than 65 mmHg during surgery, and a hypotensive segment as a section in which hypotensive events appear in succession. However, because the ABP waveform is noisy, the MAP used to define an event may contain outliers, which can generate false events or non-events. To remove unreliable cases, we used the j signal quality index (jSQI) algorithm [18], which calculates a noise score for the ABP waveform. Events and non-events extracted from noisy ABP waveforms with a jSQI of 0.8 or less were excluded. We sampled patient events at intervals of approximately 20 min to minimize potential residual effects from the previous event (see S1 Fig). Then, ABP, ECG, and EEG waveforms of 1-min intervals were collected at 3, 5, 10, and 15 min before each event. For sampling non-events, 30-min segments in which the per-minute MAP was maintained above 75 mmHg were first extracted, and three 1-min samples of each waveform were obtained from the middle of the segment. The sampling rates for ABP, ECG, and EEG waveforms were 500, 500, and 128 Hz, respectively. After testing filters with various pass bands, we decided on the following pre-processing methods. ECG waveforms were pre-processed with a 1-40-Hz band-pass filter and normalized using the z-score. EEG waveforms were pre-processed using a 0.5-50-Hz band-pass filter. ABP waveforms were used without pre-processing. We split the data by case into training, validation, and test datasets in a 6:1:3 ratio, so that samples derived from a single case were never distributed across different datasets. The number of cases in Table 2 is not the same at every time point. This is because, when a hypotensive event occurs near the beginning of the recording (within 15 minutes), the ABP, ECG, and EEG waveforms can be sampled only at some of the 3-, 5-, 10-, and 15-minute time points before the event, depending on when the event occurs.
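The event definition and sampling rules above can be illustrated with a minimal sketch. It assumes a per-minute MAP series as input and uses hypothetical helper names; the rule of keeping only segment onsets that are at least 20 min apart, and the choice of the 1-min window that ends exactly at the lead time, are assumptions made for illustration (the exact sampling scheme is given in S1 Fig, which is not reproduced here).

```python
from typing import Dict, List, Tuple

MAP_THRESHOLD = 65.0          # mmHg, event definition from the text
LEAD_TIMES = (3, 5, 10, 15)   # minutes before the event, from the text
MIN_GAP = 20                  # minutes between sampled events (assumed rule)


def hypotensive_segments(map_per_min: List[float]) -> List[Tuple[int, int]]:
    """Return (start, end) minute indices, end exclusive, of runs with MAP < 65."""
    segments = []
    start = None
    for minute, value in enumerate(map_per_min):
        if value < MAP_THRESHOLD:
            if start is None:
                start = minute
        elif start is not None:
            segments.append((start, minute))
            start = None
    if start is not None:
        segments.append((start, len(map_per_min)))
    return segments


def sample_event_onsets(segments: List[Tuple[int, int]],
                        min_gap: int = MIN_GAP) -> List[int]:
    """Keep segment onsets that are at least min_gap minutes after the last kept onset."""
    onsets: List[int] = []
    for start, _end in segments:
        if not onsets or start - onsets[-1] >= min_gap:
            onsets.append(start)
    return onsets


def lead_windows(onset: int,
                 lead_times: Tuple[int, ...] = LEAD_TIMES) -> Dict[int, Tuple[int, int]]:
    """Map each lead time to its 1-min window (start, end); skip windows before minute 0."""
    windows = {}
    for lead in lead_times:
        start = onset - lead - 1
        if start >= 0:
            windows[lead] = (start, onset - lead)
    return windows
```

For an event whose onset is at minute 8, only the 3- and 5-min windows exist, which is the reason the number of samples differs between time points.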

Model development
In this study, we developed a model based on ResNet [19], a popular architecture for image classification. ResNet mitigates the vanishing-gradient problem that arises as networks deepen by using residual learning, in which skip connections add the input of a group of layers to its output. We tuned the ResNet to extract important features from each waveform. Our model consists of three ResNets and one classifier. Fig 2 shows an example of our model architecture. Each ResNet has a single encoder block, multiple residual blocks, and a fully connected layer. The encoder block consists of a convolutional neural network (CNN) with a max pooling function. A CNN is a deep-learning method originally used for analyzing image data. It learns to extract positional and morphological information from data using filters, which are parameters that extract features by moving at assigned intervals called strides. One-dimensional sequential data, such as audio and biosignal data, can be analyzed with a one-dimensional CNN, which is what we used in this study. Each residual block has two sets of CNNs with batch normalization, dropout, and rectified linear-unit activation functions. The input and output of the residual block are summed by a skip connection, which is known to improve the training efficiency of models consisting of many layers. Because we use combinations of waveforms to predict intraoperative hypotension, the models have either a single network or multiple networks, depending on the number of biosignals. The outputs of the networks are concatenated and passed to a classifier with two fully connected layers and a sigmoid function, which predicts intraoperative hypotension.
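The two ideas at the core of this architecture, the skip connection and the concatenation of per-waveform features, can be sketched in plain Python. This is a deliberately simplified illustration and not the model used in the study: batch normalization, dropout, max pooling, and multiple channels are omitted, the classifier is reduced to a single linear layer (the study uses two fully connected layers), and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Stride-1 1D convolution with zero padding so the output length equals the input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def residual_block(x: Sequence[float],
                   kernel1: Sequence[float],
                   kernel2: Sequence[float]) -> List[float]:
    """Two convolutions, then add the block input back (skip connection) before the final ReLU."""
    h = relu(conv1d_same(x, kernel1))
    h = conv1d_same(h, kernel2)
    return relu([a + b for a, b in zip(x, h)])


def fuse_and_classify(feature_vectors: Sequence[Sequence[float]],
                      weights: Sequence[float],
                      bias: float) -> float:
    """Concatenate per-waveform features and apply one linear layer plus a sigmoid."""
    fused = [v for vec in feature_vectors for v in vec]
    if len(fused) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * v for w, v in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

With all-zero kernels the block reduces to a ReLU of its input, which shows how the skip connection lets the signal pass through even when the convolutional path contributes nothing.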
Each ResNet contains 12 residual blocks and one linear layer. Detailed hyperparameter settings are described in S1 Table. We set the output size of the ResNet for each biosignal to 32. Binary cross-entropy is used as the loss function, and Adam is used as the optimizer. Based on preliminary experiments, the filter sizes for the ABP, ECG, and EEG waveforms were set to 15, 15, and 7, respectively. The stride was set to one for all CNN layers, the dropout rate to 0.5, and the learning rate to 0.0001. We set the maximum number of epochs to 100 and stopped training early when the loss on the validation set had not decreased for five epochs. We chose the model with the lowest loss as the final model, which was the model at the 68th epoch of training. Hyperparameter tuning was limited by the extensive experiments required and the available computing power.
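The early-stopping rule described above (a maximum of 100 epochs, stopping after five epochs without improvement in validation loss, and keeping the lowest-loss model) can be sketched as follows. The function name is hypothetical, and the list of validation losses stands in for an actual training loop, so this shows only the stopping logic, not the training itself.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float],
                         patience: int = 5,
                         max_epochs: int = 100) -> Tuple[int, int]:
    """Return (best_epoch, stopped_epoch), both 1-indexed.

    val_losses[i] is the validation loss after epoch i + 1. Training stops once
    `patience` consecutive epochs have passed without a new lowest loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    stopped_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stopped_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, stopped_epoch
```

In this sketch the checkpoint to keep is the one saved at `best_epoch`, while `stopped_epoch` is the last epoch that was actually run.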

Experimental setup
The performance of the model was evaluated using the test dataset. AUROC, AUPRC, sensitivity, and specificity were calculated. The optimal cut-off value was chosen to minimize the difference between the sensitivity and specificity. Confidence intervals (CI) for each value were calculated using the exact binomial confidence limits. All experiments were performed on a workstation equipped with an Intel Xeon Silver 4114 processor, 128-GB RAM, and two NVIDIA RTX TITAN graphics processing units.
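The cut-off selection and the exact binomial interval can be sketched in plain Python as below. The function names are hypothetical, the candidate cut-offs are taken to be the observed model outputs, and the interval is the Clopper-Pearson construction solved by bisection, which is one common reading of "exact binomial confidence limits"; the study's actual implementation may differ.

```python
import math
from typing import Sequence, Tuple


def sens_spec(labels: Sequence[int], scores: Sequence[float],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is called positive.

    Assumes labels contain at least one event (1) and one non-event (0).
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Cut-off among observed scores that minimizes |sensitivity - specificity|."""
    def gap(cutoff: float) -> float:
        sens, spec = sens_spec(labels, scores, cutoff)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def exact_binomial_ci(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval for k successes out of n trials."""
    def solve(cdf_k: int, target: float) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(60):  # the CDF decreases in p, so bisection converges
            mid = (lo + hi) / 2.0
            if _binom_cdf(cdf_k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0
    lower = 0.0 if k == 0 else solve(k - 1, 1.0 - alpha / 2.0)
    upper = 1.0 if k == n else solve(k, alpha / 2.0)
    return lower, upper
```

A sensitivity of, say, `tp / (tp + fn)` would then be reported with `exact_binomial_ci(tp, tp + fn)`.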

Correlation analysis between hypotension prediction and actual occurrence
All model-output values for the event and non-event samples in the test dataset were gathered and segmented into model-output bins. In each bin, the percentage of event samples was taken as the observed rate of hypotensive events for that range of model outputs.
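The binning procedure can be sketched as follows. The sketch assumes model outputs scaled to the range 0 to 1 and ten equal-width bins; both the bin count and the function name are assumptions for illustration, since the bin definition used in the study is not stated here.

```python
from typing import List, Sequence, Tuple


def calibration_table(labels: Sequence[int],
                      outputs: Sequence[float],
                      n_bins: int = 10) -> List[Tuple[float, float, int, float]]:
    """Return (bin_low, bin_high, n_samples, event_rate_percent) for each non-empty bin.

    labels are 1 for event samples and 0 for non-event samples;
    outputs are model outputs in [0, 1].
    """
    counts = [0] * n_bins
    events = [0] * n_bins
    for y, p in zip(labels, outputs):
        idx = min(int(p * n_bins), n_bins - 1)  # an output of 1.0 falls in the last bin
        counts[idx] += 1
        events[idx] += y
    rows = []
    for b in range(n_bins):
        if counts[b]:
            rate = 100.0 * events[b] / counts[b]
            rows.append((b / n_bins, (b + 1) / n_bins, counts[b], rate))
    return rows
```

For a well-calibrated model, the event rate in each row is close to the midpoint of that bin expressed as a percentage, which corresponds to the near-linear relationship the calibration analysis looks for.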
Post hoc analysis for model comparison. We developed and tested our model by sampling hypotensive events with at least 20-min intervals to exclude the residual effect of the previous hypotensive event. To compare the performance of our model with that previously reported by Hatib et al. [14], we adopted the same sampling strategy, which captures every hypotensive event in the test dataset. Then, we calculated and presented AUROC and AUPRC in our models using only ABP and a combination of ABP and EEG.

Dataset characteristics
Among the eligible cases, the mean age was 58.7 years. Males comprised 50.6% (n = 6,850) of the cases. Total anaesthesia duration was 240 ± 113 min, and total surgery duration was 180 ± 105 min. Emergency operations accounted for 7.2% (n = 969) of surgeries. In 16% (n = 2,061) of cases, the American Society of Anesthesiologists (ASA) classification was three or higher, indicating severe pre-anaesthesia medical comorbidities. Patient characteristics were comparable between the training and testing datasets (Table 1). There was no significant difference among the training, validation, and test datasets except for weight (p-value < 0.05).
The number of cases and ABP, ECG, and EEG waveforms that were split into training, validation, and test sets are described in Table 2. In brief, waveforms were sampled 3, 5, 10, and 15 min before the occurrence of hypotension for events and were sampled in the middle of 30-min segments of normal arterial pressure for non-events. The data below are expressed as mean±SD (

Hypotension prediction performance depending on a set of combinations of waveforms
The prediction performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, and specificity at various time points and waveform compositions. Among models using a single waveform, ABP was much better than ECG or EEG at predicting intraoperative hypotension. Of the models using multiple waveforms, the combination of ABP and EEG waveforms was generally the best, outperforming models that used only ABP (Table 3; see also S2 Table). A representative attention map of the model using both ABP and EEG is provided in S2 Fig. Our model generally performed better when predicting recurrent events than first-occurring events (S3 Table).
While MAP was stable in the 70- to 85-mmHg range, the HRI for predicting the event 3 min later using ABP (HRI ABP 3 min) increased sharply from approximately 20 to 95 at 3 min before the hypotensive event, and it remained at that level until the event occurred. The HRI using both ABP and EEG (HRI ABP+EEG 3 min) showed similar but more fluctuating predictions. The model outputs predicting hypotension 15 min before the event were more robust with both ABP and EEG (HRI ABP+EEG 15 min) than with ABP only (HRI ABP 15 min), which had higher values and steep increments over 95% around 10 min before the hypotensive event.
Post hoc analysis for model comparison. We further tested our model using the same event-sampling method previously described by Hatib et al. [14], upon which the proprietary Hypotension Prediction Index algorithm is based. In our main analysis, we sampled hypotensive events with at least a 20-min interval to exclude the residual effect of the previous event. In the post hoc analysis, we rebuilt the test set by capturing every hypotensive event at 1-min  Table). Our model demonstrated performance measures similar to the Hypotension Risk Index.
The addition of EEG to ABP also improved the model performance at all time points (see Table 4 and S6 Table). The model calibration analysis showed nearly linear associations between model outputs and rates of occurrence of intraoperative hypotension (see S3 Fig).

Discussion
We developed a deep-learning model to predict intraoperative hypotension from different combinations of ABP, ECG, and EEG waveforms using high-fidelity monitoring data covering more than 3 million minutes from 14,140 patients. The model using both ABP and EEG waveforms performed better than all other models at all time points (3, 5, 10, and 15 min). The combination of ABP and EEG waveforms was also beneficial for calibrating model outputs to better reflect actual occurrences of hypotensive events.
Several studies have developed machine-learning models for the prediction of hemodynamic instability. Lin et al. developed an artificial neural-network model that predicted postinduction hypotension with 82.3% accuracy [20]. The noninvasive features used in the model were commonly used standard variables that were readily retrievable from all anaesthesia records, including 11 patient-related, 2 surgical, and 5 anaesthetic variables. Convertino et al. applied novel feature-extraction and machine-learning techniques to plethysmography waveforms to identify patients who were developing cardiac instability [21]. Unfortunately, these approaches do not allow for real-time prediction of hemodynamic instability. In a recent work, Chen et al. predicted hypotension in hemodialysis patients using deep learning [22]. They utilized demographic, clinical, and laboratory data to develop a prediction model. The AUROC in that study was only 0.65, which suggests that waveform data are needed to predict hypotensive events using deep learning. Moreover, Davies et al. validated the Edwards Hypotension Prediction Index software, in which the AUROCs were 0.92, 0.89, and 0.88 at 5, 10, and 15 minutes before a hypotensive event, respectively.
The Hypotension Prediction Index (a commercial hypotension prediction algorithm) calculates various hemodynamic parameters and their combinations and uses them as model features for machine learning. However, calculating the Hypotension Prediction Index requires algorithms incorporated in commercial sensors (e.g., FloTrac and CO-Trek). In contrast, we utilized a deep-learning architecture that automatically extracts diverse features from waveform data, demonstrating good performance when predicting hypotension. The predictive ability of our model is similar to that seen in the derivation cohorts of the proprietary Hypotension Prediction algorithm [14]. Another machine-learning algorithm was constructed to predict the occurrence of hypotension within 10 min after induction of general anaesthesia [23]. Using a combination of variables, such as comorbidity, preoperative vital signs, and medication, the AUROC was 0.76 with a gradient boosting model. Although our algorithm used a more conservative definition of hypotension (<65 mmHg vs. <55 mmHg) and did not depend on other clinical parameters, the predictive performance 10 min before IOH was excellent, with an AUROC of 0.898. This suggests that deep learning approaches can benefit from detecting clinically imperceptible and subtle changes in multiple heterogeneous biosignals.
In this study, adding EEG to the ABP waveform improved the model performance at all time points. Intraoperative hypotension is caused by a reduction in cardiac preload or afterload, by an impairment in cardiac contractility, or by a combination of these. Deeper levels of anaesthesia may make patients more susceptible to intraoperative hypotension by reducing sympathetic tone, myocardial contractility, and vascular resistance [23,24]. On the other hand, EEGs reflect cerebral blood flow and can reveal subclinical brain ischemia [25][26][27]. Thus, hemodynamic instability before a blood pressure drop can cause relative hypoperfusion of the brain and may result in subtle changes in EEG activity. This effect can be more prominent when using volatile anaesthetics, which can impair cerebral blood flow autoregulation [28].
Using both EEG and ABP provided better-calibrated estimates of the occurrence of hypotension than using ABP alone. Although recent advances in deep learning have improved neural-network accuracy, modern neural networks are usually not well calibrated [29]. A real-time decision-support system, including one for hypotension prediction, should provide a calibrated confidence measure alongside its prediction, because final clinical decisions should be made by the doctors in charge. The effect of adding EEG to the ABP on calibration was more prominent in the range of model output between 45 and 90, which is above the thresholds of binary classification.
The addition of ECG did not improve the model performance compared with the model using only ABP. In contrast to the information from the EEG, most hemodynamic information from the ECG could also be reflected in the ABP waveforms. The electrical waves of the heart obtained through two skin electrodes may not have been more informative than the arterial pressure waves gathered directly within the vessels. However, considering that the subjects in the present study were patients who underwent non-cardiac surgery, most of them electively, the importance of ECG is likely to be higher in other populations with high rates of emergency operation or a high risk of intraoperative myocardial infarction.
Although our algorithm was trained with data from anesthetized patients, this approach could be applied to patients with critical illness such as septic shock and coronavirus disease of 2019 (COVID-19) with further validation and fine-tuning. Hypotension and shock have been observed in a subgroup of patients with severe COVID-19 with possible cytokine storm syndrome [30]. If the risk of hypotension can be predicted with our algorithm, several therapeutic measures, including vasopressors, steroids and novel immunotherapies may be applied earlier and improve outcomes in patients with severe COVID-19.
This study had several limitations. Prospective and external validation are required because validation was performed only with retrospective data, and the model may be overfitted to specific patterns shared by the data collected during that period. We used data exclusively from anesthetized patients to train our model, which may limit its performance in other clinical settings, such as an intensive care unit. As two-lead ECG and one-channel BIS-EEG waveforms were used in our model, high-resolution EEGs or multi-lead ECGs may provide superior performance. For the sake of accuracy, we built our models based only on data of clear hypotension (MAP < 65 mmHg) and clear non-hypotension (MAP > 75 mmHg), as two easily separable and mutually exclusive labels are important for dichotomous classification. Events were extracted at minimum intervals of 20 min, but it is possible that the residual effects of preceding hypotensive events were still learned from waveforms used to predict recurrent events. Also, we extracted features from all waveforms by applying only a 1D residual network and used those features for IOH prediction. As more important features can possibly be extracted by using specific networks for different waveforms, further work is required with network architectures that are suited to individual waveforms. These approaches may improve the performance of deep learning models for predicting IOH.

Conclusion
Our deep-learning model trained with waveforms of ABP, EEG, and ECG demonstrated good performance in predicting intraoperative hypotension in patients undergoing non-cardiac surgery. Without specific algorithms for feature extraction, our deep-learning model using raw ABP waveforms showed high predictive performance. The combination of EEG and ABP may enhance model performance and the calibration of hypotension risk indices.